Non-Strict Cache Coherence: Exploiting Data-Race Tolerance in Emerging Applications

Authors

Siddhartha V. Tambat

Sriram Vajapeyam

Abstract

Software distributed shared memory (DSM) platforms on networks of workstations tolerate large network latencies by employing one of several weak memory consistency models. Data-race tolerant applications, such as Genetic Algorithms (GAs) and Probabilistic Inference, offer an additional degree of freedom to tolerate network latency: they do not synchronize shared memory references, and behave correctly when supplied outdated shared data. However, these algorithms often have a high communication-to-computation ratio and can flood the network with messages in the presence of large message delays. We explore the benefits of designing a DSM with non-strict cache coherence for such applications. We study the performance of controlled asynchronous implementations of these algorithms via the use of a previously proposed blocking Global Read memory access primitive. Global Read implements non-strict cache coherence by guaranteeing to return to the reader a shared datum value from within a specified staleness range; synchronization primitives are thereby avoided. As compared to fully asynchronous implementations, controlled (i.e., partial) asynchrony, implemented using Global Read, reduces the overall amount of computation done with stale data by a process, thus controlling the amount of shared updates (and thereby the network traffic) generated. Experiments on an IBM SP2 multicomputer with an Ethernet interconnect show significant performance improvements for controlled asynchronous implementations. On a lightly loaded network, most of the GA benchmarks see 30% to 40% improvement over the best competitor across configurations ranging from 2 to 16 processors, while two of the Probabilistic Inference benchmarks see more than 80% improvement on a 2-node configuration. As the network load increases, the benefits of non-strict coherence and partial asynchrony increase significantly.
Overall, the results indicate that non-strict cache coherence is significantly beneficial over both the data-race-free based weak consistency memory models and fully asynchronous models that have no guarantees regarding coherence. (A shorter version of this document was published in the 29th International Conference on Parallel Processing, ICPP-2000, held in Toronto, Canada, August 21-24, 2000.)
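
The blocking behaviour of Global Read described in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' DSM implementation: the class name `SharedDatum`, the method names, the `timeout` parameter, and the choice to measure staleness in iterations are all assumptions made for illustration. A reader at iteration `current_iter` blocks until its copy of the datum was produced no earlier than `current_iter - max_staleness`.

```python
import threading


class SharedDatum:
    """Hypothetical shared value tagged with the iteration that produced it."""

    def __init__(self, value, iteration=0):
        self._value = value
        self._iteration = iteration
        self._cond = threading.Condition()

    def write(self, value, iteration):
        """Publish an update and wake any readers blocked in global_read."""
        with self._cond:
            if iteration >= self._iteration:  # drop out-of-order updates
                self._value = value
                self._iteration = iteration
                self._cond.notify_all()

    def global_read(self, current_iter, max_staleness, timeout=None):
        """Return (value, iteration) once the copy is fresh enough.

        Blocks while the stored value is older than
        current_iter - max_staleness (assumed staleness measure).
        """
        oldest_ok = current_iter - max_staleness
        with self._cond:
            fresh = self._cond.wait_for(
                lambda: self._iteration >= oldest_ok, timeout
            )
            if not fresh:
                raise TimeoutError("no sufficiently fresh value arrived")
            return self._value, self._iteration
```

In this sketch a large `max_staleness` approaches fully asynchronous behaviour (the read almost never blocks), while `max_staleness=0` forces the reader to wait for a value from its own iteration. The intermediate settings correspond to the controlled asynchrony the abstract describes.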

Similar resources

Exploiting the Benefits of Multiple-Path Network DSM Systems: Architectural Alternatives and Performance Evaluation

Modern high performance networks being used for scalable distributed shared memory (DSM) systems support multiple paths to increase bandwidth and/or reduce contention. Such networks violate the constraint of pairwise in-order message delivery implicitly required by many existing directory-based cache coherence protocols. To solve this problem, two alternative strategies are currently used by ...

Exploiting Data Locality in Adaptive Architectures

The speed of processors increases much faster than memory access time, which makes memory accesses expensive. To address this problem, cache hierarchies are introduced to serve the processor with data. However, the effectiveness of caches depends on the amount of locality in the application’s memory access pattern. The behavior of various programs differs greatly in terms of cache miss characte...

Soft Coherence: Preliminary Experiments with Error-Tolerant Cache Coherence in Numerical Applications

As we scale into the multi-core era, we face severe challenges in the scalability and performance of on-chip cache-coherent shared memory mechanisms. We explore application error-tolerance as an extra degree of freedom to meet these challenges. Iterative numerical algorithms, in particular, can cope with the occasional stale value with little or no effect on accuracy or convergence time. We exp...

TachoRace: Exploiting Performance Counters for Run-Time Race Detection

Fixing data races is a difficult parallel programming problem, even for experienced programmers. At the moment, dynamic race detectors are frequently used because they find races more reliably than other approaches; however, the dynamic approach significantly influences application behavior during debugging because all threads’ memory accesses need to be monitored. Despite using such detectors ...

Predicting Data Cache Misses in Non - Numeric

To maximize the benefit and minimize the overhead of software-based latency tolerance techniques, we would like to apply them precisely to the set of dynamic references that suffer cache misses. Unfortunately, the information provided by the state-of-the-art cache miss profiling technique (summary profiling) is inadequate for references with intermediate miss ratios: it results in either failing to...

Publication date: 2000
